From paragraph to graph: latent semantic analysis for information visualization.

Authors

  • Thomas K Landauer
  • Darrell Laham
  • Marcia Derr
Abstract

Most techniques for relating textual information rely on intellectually created links such as author-chosen keywords and titles, authority indexing terms, or bibliographic citations. Similarity of the semantic content of whole documents, rather than just titles, abstracts, or overlap of keywords, offers an attractive alternative. Latent semantic analysis provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations. However, latent semantic analysis correlations with human text-to-text similarity judgments are often empirically highest at approximately 300 dimensions. Thus, two- or three-dimensional visualizations are severely limited in what they can show, and the first and/or second automatically discovered principal component, or any three such for that matter, rarely capture all of the relations that might be of interest. It is our conjecture that linguistic meaning is intrinsically and irreducibly very high dimensional. Thus, some method to explore a high dimensional similarity space is needed. But the 2.7 × 10^7 projections and infinite rotations of, for example, a 300-dimensional pattern are impossible to examine. We suggest, however, that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system to extract information about objects and from moving patterns, can often succeed in discovering multiple revealing views that are missed by current computational algorithms. We show some examples of the use of latent semantic analysis to support such visualizations and offer views on future needs.
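
The dimension reduction step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes raw word counts (no term weighting), whitespace tokenization, and a hypothetical four-document toy corpus, and the helper names are invented for this example. It also checks one plausible reading of the projection count, namely that 2.7 × 10^7 is the number of ordered choices of three axes out of 300 (300 · 299 · 298 = 26,730,600).

```python
import math
import numpy as np


def term_document_counts(docs):
    """Build a (terms x documents) matrix of raw word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            counts[row[w], j] += 1.0
    return counts


def lsa_document_vectors(docs, k):
    """Return one k-dimensional vector per document via truncated SVD."""
    counts = term_document_counts(docs)
    # Reduced SVD: counts = U @ diag(s) @ Vt, singular values in descending order.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Keep the k largest singular directions; rows of the result are documents.
    return (s[:k, None] * vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def ordered_axis_triples(n_dims):
    """Number of ordered selections of 3 distinct axes from n_dims."""
    return math.perm(n_dims, 3)


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [
        "human machine interface",
        "human computer interface survey",
        "graph of trees",
        "graph minors trees survey",
    ]
    vectors = lsa_document_vectors(docs, k=2)
    print("similarity of documents 0 and 1:", cosine(vectors[0], vectors[1]))
    print("similarity of documents 0 and 2:", cosine(vectors[0], vectors[2]))
    print("ordered 3-axis views of 300 dimensions:", ordered_axis_triples(300))
```

In practice the retained dimensionality would be far larger than 2 (the abstract cites roughly 300), which is exactly why a fixed two- or three-dimensional plot shows only a small slice of the similarity structure.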

Similar resources

Classification of Documents in E-Learning Using Multidimensional Latent Semantic Analysis

In this paper we consider the problem of dimensionality reduction techniques. Two techniques, Independent Component Analysis (ICA) and multidimensional latent semantic analysis (MDLSA), are proposed. A new document analysis method named multidimensional latent semantic analysis (MDLSA), which resolves the problem of in-depth document analysis, mines local information from a document effici...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

What’s up, Doc? Visualizing information in text documents for better readability and understanding

The summary and key ideas of a technical paper are textual, in the form of keywords, abstracts, introductions and conclusions. More can certainly be done to improve the presentation of information in long papers, especially ones with long sentences. We present representations of a technical paper at three granularities: the document level, the paragraph level, and the sentence level, to aid in the...

Model of Making Decisions during an Information Search Task

This paper presents a cognitive computational model of the way people read a paragraph with the task of quickly deciding whether it is related or not to a given goal. In particular, the model attempts to predict the time at which participants would decide to stop reading the paragraph because they have enough information to make their decision. Our model makes predictions at the level of words ...

Determining curricular coverage of student contributions to an online discourse environment through the use of latent semantic analysis and term clouds

This paper presents a new approach to mapping student contributions to curriculum guidelines through the use of Latent Semantic Analysis and information visualization techniques. A new information visualization technique – differential term clouds – is introduced as a means to make clear changes in semantic fields over time.

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume: 101 Suppl 1

Publication date: 2004